Extending Feature Decay Algorithms Using Alignment Entropy
Abstract
In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. Feature Decay Algorithms (FDA) have demonstrated excellent performance in a number of tasks. While the decay function is at the heart of the success of FDA, its parameters are initialised with the same weights. In this paper, we investigate the effect on Machine Translation of assigning more appropriate weights to words using word-alignment entropy. In experiments on German to English, we show the effect of calculating these weights using two popular alignment methods, GIZA++ and FastAlign, using both automatic and human evaluations. We demonstrate that our novel FDA model is a promising research direction.
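As a rough illustration of the word-alignment entropy mentioned in the abstract, the sketch below computes, for each source word, the Shannon entropy of the target words it is aligned to. The function names, the input format (a flat list of source-target alignment links, such as could be extracted from GIZA++ or FastAlign output), and the `1 + entropy` mapping to an initial weight are all assumptions made for this example; the abstract does not state the exact formula used in the paper.

```python
import math
from collections import Counter, defaultdict


def alignment_entropy(aligned_pairs):
    """Entropy (in bits) of the aligned-target distribution of each source word.

    aligned_pairs: iterable of (source_word, target_word) alignment links.
    This input format is an assumption for the sketch.
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1

    entropy = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        # H(src) = sum over targets of p * log2(1 / p)
        entropy[src] = sum(
            (c / total) * math.log2(total / c) for c in tgt_counts.values()
        )
    return entropy


def initial_weights(entropy):
    """Hypothetical mapping from entropy to an initial FDA feature weight.

    Words with more ambiguous alignments get a larger starting weight.
    The 1 + H form is illustrative only, not the paper's definition.
    """
    return {word: 1.0 + h for word, h in entropy.items()}


links = [
    ("Haus", "house"), ("Haus", "house"),
    ("Bank", "bank"), ("Bank", "bench"),
]
h = alignment_entropy(links)
w = initial_weights(h)
```

In this toy input, "Haus" is always aligned to the same target word, so its entropy is zero, while "Bank" is split evenly between two targets and has an entropy of one bit.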
Related resources
Applying N-gram Alignment Entropy to Improve Feature Decay Algorithms
Data selection is a common step in Machine Translation pipelines. Feature Decay Algorithms (FDA) is a data selection technique that has shown good performance in several tasks. FDA aims to maximize the coverage of the test set's n-grams. Intuitively, however, more ambiguous n-grams require more training examples in order to adequately estimate their translation probabilities. This ambig...
Image alignment via kernelized feature learning
Machine learning is an application of artificial intelligence that can automatically learn and improve from experience without being explicitly programmed. The primary assumption of most machine learning algorithms is that the training set (source domain) and the test set (target domain) follow the same probability distribution. However, in most of the real-world application...
Sentence Alignment Method Based on Maximum Entropy Model Using Anchor Sentences
The paper proposes a sentence alignment method based on a maximum entropy model that uses anchor sentences to align ancient and modern Chinese sentences in historical classics. The method selects sentence pairs that share the same phrases at the beginning or end of the sentence, or that share the same time phrases, as anchor sentence pairs, which are then used to divide the paragraph into several sections. ...
Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features
Feature Selection (FS) is an important pre-processing step in machine learning and data mining. Traditional feature selection methods all assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...
Automatic colposcopy video tissue classification using higher order entropy-based image registration
Colposcopy is a well-established method to detect and diagnose intraepithelial lesions and uterine cervical cancer in early stages. During the exam, color and texture changes are induced by the application of a contrast agent (e.g. 3-5% acetic acid solution or iodine). Our aim is to densely quantify the change in the acetowhite decay level for a sequence of images captured during a colposcopy exa...